PrObeD: Proactive Object Detection Wrapper
Previous research in object detection focuses on various tasks,
including detecting objects in generic and camouflaged images. These works are
regarded as passive works for object detection as they take the input image as
is. However, neural network training is not guaranteed to converge to a global
minimum; therefore, we argue that the trained weights of an object
detector are not optimal. To rectify this problem, we propose a wrapper based
on proactive schemes, PrObeD, which enhances the performance of these object
detectors by learning a signal. PrObeD consists of an encoder-decoder
architecture, where the encoder network generates an image-dependent signal
termed templates to encrypt the input images, and the decoder recovers this
template from the encrypted images. We propose that learning the optimum
template results in an object detector with an improved detection performance.
The template acts as a mask to the input images to highlight semantics useful
for the object detector. Finetuning the object detector with these encrypted
images enhances the detection performance for both generic and camouflaged images. Our
experiments on the MS-COCO, CAMO, COD10K, and NC4K datasets show improvement
over different detectors after applying PrObeD. Our models/codes are available
at https://github.com/vishal3477/Proactive-Object-Detection.
Comment: Accepted at NeurIPS 2023
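The proactive scheme above can be sketched in a toy form: an encoder maps the input image to an image-dependent template, the template multiplicatively masks ("encrypts") the image, and a decoder recovers the template from the encrypted image. The linear encoder and division-based decoder below are hypothetical placeholders for the paper's learned networks, not the actual PrObeD architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(image, w):
    """Toy stand-in for the encoder: maps the image to an
    image-dependent template in (0, 1). A sigmoid over an elementwise
    weighting is a hypothetical placeholder for a learned CNN."""
    return 1.0 / (1.0 + np.exp(-image * w))

def encrypt(image, template):
    """Apply the template as a multiplicative mask on the input image,
    highlighting semantics useful for the downstream detector."""
    return image * template

def decoder(encrypted, image):
    """Toy decoder: recovers the template from the encrypted image.
    Here recovery is exact division; the paper learns this mapping."""
    return encrypted / np.clip(image, 1e-6, None)

image = rng.random((4, 4)) + 0.1        # dummy single-channel image
w = rng.standard_normal((4, 4))         # placeholder encoder weights

template = encoder(image, w)
enc = encrypt(image, template)
recovered = decoder(enc, image)
assert np.allclose(recovered, template)
```

In the actual wrapper, the detector is then finetuned on the encrypted images rather than the raw inputs.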
Unsupervised Green Object Tracker (GOT) without Offline Pre-training
Supervised trackers trained on labeled data dominate the single object
tracking field owing to their superior tracking accuracy, but the labeling cost and huge
computational complexity hinder their application on edge devices.
Unsupervised learning methods have also been investigated to reduce the
labeling cost but their complexity remains high. Aiming at lightweight
high-performance tracking, feasibility without offline pre-training, and
algorithmic transparency, we propose a new single object tracking method,
called the green object tracker (GOT), in this work. GOT conducts an ensemble
of three prediction branches for robust box tracking: 1) a global object-based
correlator to predict the object location roughly, 2) a local patch-based
correlator to build temporal correlations of small spatial units, and 3) a
superpixel-based segmentator to exploit the spatial information of the target
frame. GOT offers tracking accuracy competitive with that of state-of-the-art
unsupervised trackers, which demand heavy offline pre-training, at a lower
computation cost. GOT has a tiny model size (<3k parameters) and low inference
complexity (around 58M FLOPs per frame). Since its inference complexity is
0.1%-10% that of DL trackers, it can be easily deployed on mobile and edge
devices.
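The three-branch ensemble can be illustrated with a minimal fusion step: each branch proposes a candidate box, and the tracker combines them into one prediction. The confidence-weighted average below, with its box values and scores, is purely illustrative; GOT's actual fusion rule may differ.

```python
import numpy as np

def fuse_boxes(boxes, scores):
    """Hypothetical fusion of three branch outputs: a confidence-weighted
    average of candidate boxes given as (x, y, w, h)."""
    scores = np.asarray(scores, dtype=float)
    weights = scores / scores.sum()            # normalize confidences
    return (np.asarray(boxes, dtype=float) * weights[:, None]).sum(axis=0)

# Illustrative candidates from: 1) the global object-based correlator,
# 2) the local patch-based correlator, 3) the superpixel-based segmentator.
boxes = [(50, 40, 32, 32), (52, 41, 30, 34), (49, 39, 33, 31)]
scores = [0.5, 0.3, 0.2]
fused = fuse_boxes(boxes, scores)
print(fused)  # [50.4 40.1 31.6 32.4]
```

Ensembling heterogeneous cues (global appearance, local patches, spatial segmentation) is what gives the tracker robustness without any offline pre-training.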
GUSOT: Green and Unsupervised Single Object Tracking for Long Video Sequences
Supervised and unsupervised trackers built on deep learning have become
popular in recent years. Yet, they demand high computational
complexity and a high memory cost. A green unsupervised single-object tracker,
called GUSOT, that aims at object tracking for long videos under a
resource-constrained environment is proposed in this work. Built upon a
baseline tracker, UHP-SOT++, which works well for short-term tracking, GUSOT
contains two additional new modules: 1) lost object recovery, and 2)
color-saliency-based shape proposal. They help resolve the tracking loss
problem and offer a more flexible object proposal, respectively. Thus, they
enable GUSOT to achieve higher tracking accuracy in the long run. We conduct
experiments on the large-scale dataset LaSOT with long video sequences, and
show that GUSOT offers a lightweight, high-performance tracking solution
suited to mobile and edge computing platforms.
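The color-saliency-based shape proposal can be sketched as histogram back-projection: pixels whose colors resemble the target's color distribution score high, and thresholding the resulting map yields a flexible, non-rectangular proposal. The quantization scheme and threshold below are assumptions for illustration, not GUSOT's exact module.

```python
import numpy as np

def color_saliency_mask(frame, target_hist, bins=8):
    """Hypothetical color-saliency shape proposal: back-project the
    target's color histogram onto the frame, then threshold the map to
    obtain a binary, freely-shaped object proposal."""
    idx = np.clip((frame * bins).astype(int), 0, bins - 1)  # quantize colors
    saliency = target_hist[idx]              # per-pixel target-color likelihood
    return saliency > saliency.mean()        # binary shape proposal

rng = np.random.default_rng(1)
frame = rng.random((16, 16))                 # toy grayscale frame in [0, 1)
target = frame[4:8, 4:8]                     # assume the target patch is known
hist, _ = np.histogram(target, bins=8, range=(0, 1))
hist = hist / hist.sum()                     # normalize to a distribution
mask = color_saliency_mask(frame, hist)
print(mask.shape)  # (16, 16)
```

Because the proposal is a pixel mask rather than an axis-aligned box, it can follow deforming targets, which is what makes the proposal "more flexible" over long sequences.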
Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic-Structural Constraints
Ensuring the realism of computer-generated synthetic images is crucial to
deep neural network (DNN) training. Due to different semantic distributions
between synthetic and real-world captured datasets, there exists semantic
mismatch between synthetic and refined images, which in turn results in
semantic distortion. Recently, contrastive learning (CL) has been successfully
used to pull correlated patches together and push uncorrelated ones apart. In
this work, we exploit semantic and structural consistency between synthetic and
refined images and adopt CL to reduce the semantic distortion. Besides, we
incorporate hard negative mining to further improve performance. We
compare our method with several other benchmarking methods
using qualitative and quantitative measures and show that it offers
state-of-the-art performance.
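A common way to realize patch-level CL with hard negative mining is an InfoNCE-style loss: pull a refined patch toward its corresponding synthetic patch and push it away from the negatives most similar to it. The loss form and mining rule below are a standard CL sketch, not necessarily the paper's exact formulation.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss over patch embeddings: the anchor
    is pulled toward its positive and pushed from the negatives."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp(cos(anchor, positive) / tau)
    neg = sum(np.exp(cos(anchor, n) / tau) for n in negatives)
    return -np.log(pos / (pos + neg))

def hardest_negatives(anchor, candidates, k=2):
    """Hard negative mining: keep the k candidates most similar to the
    anchor, which contribute the strongest repulsive signal."""
    sims = [anchor @ c / (np.linalg.norm(anchor) * np.linalg.norm(c))
            for c in candidates]
    order = np.argsort(sims)[::-1][:k]
    return [candidates[i] for i in order]

rng = np.random.default_rng(2)
anchor = rng.standard_normal(16)                  # refined-patch embedding
positive = anchor + 0.1 * rng.standard_normal(16) # matching synthetic patch
candidates = [rng.standard_normal(16) for _ in range(8)]
loss = info_nce(anchor, positive, hardest_negatives(anchor, candidates))
print(float(loss))
```

Mining only the hardest negatives concentrates the gradient on the confusable patches, which is what sharpens the semantic alignment between synthetic and refined images.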